This project we will focus on exploratory data analysis of stock prices. We'll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2020.
In this exploratory data analysis, we will use pandas, numpy, datetime, matplotlib, seaborn, and pandas_datareader modules. These modules are preinstalled using pip.
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
%matplotlib inline
sns.set()
Using the pandas_datareader module, we will get the information for the following banks:
start = datetime.datetime(2006,1,1)
end = datetime.date.today()
bank_stocks = yf.download("BAC C COF JPM WFC KEY", start=start, end=end, group_by='ticker')
[*********************100%***********************] 6 of 6 completed
bank_stocks.columns.names = ['Bank Ticker', 'Stock Info']
ticker = ['BAC', 'C', 'COF', 'JPM', 'KEY', 'WFC']
bank_stocks.head()
| Bank Ticker | KEY | C | ... | JPM | WFC | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stock Info | Open | High | Low | Close | Adj Close | Volume | Open | High | Low | Close | ... | Low | Close | Adj Close | Volume | Open | High | Low | Close | Adj Close | Volume |
| Date | |||||||||||||||||||||
| 2006-01-03 | 33.180000 | 33.680000 | 32.900002 | 33.669998 | 21.695471 | 1847100 | 490.000000 | 493.799988 | 481.100006 | 492.899994 | ... | 39.299999 | 40.189999 | 26.663109 | 12838600 | 31.600000 | 31.975000 | 31.195000 | 31.900000 | 20.523809 | 11016400 |
| 2006-01-04 | 33.540001 | 33.950001 | 33.459999 | 33.880001 | 21.830795 | 1815500 | 488.600006 | 491.000000 | 483.500000 | 483.799988 | ... | 39.419998 | 39.619999 | 26.509205 | 13491500 | 31.799999 | 31.820000 | 31.365000 | 31.530001 | 20.285755 | 10870000 |
| 2006-01-05 | 33.849998 | 34.090000 | 33.730000 | 34.040001 | 21.933891 | 2057000 | 484.399994 | 487.799988 | 484.000000 | 486.200012 | ... | 39.500000 | 39.740002 | 26.589512 | 8109400 | 31.500000 | 31.555000 | 31.309999 | 31.495001 | 20.263241 | 10158000 |
| 2006-01-06 | 34.040001 | 34.080002 | 33.689999 | 34.049999 | 21.940325 | 1290000 | 488.799988 | 489.000000 | 482.000000 | 486.200012 | ... | 39.549999 | 40.020000 | 26.776842 | 7966900 | 31.580000 | 31.775000 | 31.385000 | 31.680000 | 20.382265 | 8403800 |
| 2006-01-09 | 34.000000 | 34.150002 | 33.799999 | 34.090000 | 21.966105 | 997500 | 486.000000 | 487.399994 | 483.000000 | 483.899994 | ... | 39.880001 | 40.669998 | 27.211746 | 16575200 | 31.674999 | 31.825001 | 31.555000 | 31.674999 | 20.379047 | 5619600 |
5 rows × 36 columns
Maximum closing price:
for tick in ticker:
print(tick, bank_stocks[tick]['Close'].max().round(5))
BAC 54.9 C 564.09998 COF 166.28999 JPM 166.44 KEY 39.79 WFC 65.93
Minimum closing price:
bank_stocks.xs(key='Close', axis=1, level='Stock Info').min()
Bank Ticker KEY 4.60 C 10.20 COF 8.31 BAC 3.14 JPM 15.90 WFC 8.12 dtype: float64
returns = pd.DataFrame()
for tick in ticker:
returns[tick + ' Return'] = bank_stocks[tick]['Close'].pct_change()
Here is the brief overview of returns dataset.
returns.tail()
| BAC Return | C Return | COF Return | JPM Return | KEY Return | WFC Return | |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2021-08-02 | -0.010428 | 0.001627 | -0.009895 | -0.004019 | -0.006104 | -0.003265 |
| 2021-08-03 | 0.015543 | 0.016979 | 0.008307 | 0.011378 | 0.024053 | 0.022712 |
| 2021-08-04 | -0.007263 | -0.005226 | -0.001487 | -0.010792 | -0.018991 | -0.014521 |
| 2021-08-05 | 0.019598 | 0.017075 | 0.012408 | 0.012629 | 0.013754 | 0.018202 |
| 2021-08-06 | 0.028960 | 0.019802 | 0.016913 | 0.028404 | 0.021608 | 0.037880 |
sns.pairplot(returns[1:])
<seaborn.axisgrid.PairGrid at 0x1b20aa6fe80>
Best single day returns:
returns.idxmin()
BAC Return 2009-01-20 C Return 2009-02-27 COF Return 2009-04-20 JPM Return 2009-01-20 KEY Return 2008-09-29 WFC Return 2009-01-20 dtype: datetime64[ns]
Worst single day returns:
returns.idxmax()
BAC Return 2009-04-09 C Return 2008-11-24 COF Return 2009-04-09 JPM Return 2009-01-21 KEY Return 2008-10-14 WFC Return 2008-07-16 dtype: datetime64[ns]
Standard Deviation of the returns over the time period.
returns.std()
BAC Return 0.032037 C Return 0.033917 COF Return 0.029581 JPM Return 0.024846 KEY Return 0.032525 WFC Return 0.027333 dtype: float64
Standard Deviation of the returns for the year 2019.
returns.loc['2019-01-01':'2019-12-31'].std()
BAC Return 0.014566 C Return 0.015354 COF Return 0.015220 JPM Return 0.011869 KEY Return 0.016226 WFC Return 0.012450 dtype: float64
Stock return with highest deviation is considered the riskiest because the price of that particular stock has higher volatality compared to the other.
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['KEY Return'], color='green', bins=50 )
plt.title("KEY Return for the year 2015")
c:\users\nzrnm\appdata\local\programs\python\python39\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning)
Text(0.5, 1.0, 'KEY Return for the year 2015')
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'], color='red', bins=50 )
plt.title("C Return for the year 2008")
c:\users\nzrnm\appdata\local\programs\python\python39\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning)
Text(0.5, 1.0, 'C Return for the year 2008')
Now we will import plotly and cufflinks as additional visualization modules.
sns.set_style('whitegrid')
import plotly.express as px
import plotly.graph_objects as go
The plot below shows the Closing price for each stock from 2006 to 2016. For the first graph, we use pandas built in plotting library while the second graph we use the plotly library.
for tick in ticker:
bank_stocks.loc['2006-01-01':'2015-12-31'][tick]['Close'].plot(label=tick, figsize=(16,6))
plt.legend()
<matplotlib.legend.Legend at 0x1b21204aca0>
px.line(bank_stocks.loc['2006-01-01':'2015-12-31'].xs(key = 'Close', axis=1, level='Stock Info'), template="plotly_dark")
bank_stocks['BAC']
| Stock Info | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| Date | ||||||
| 2006-01-03 | 46.919998 | 47.180000 | 46.150002 | 47.080002 | 34.280807 | 16296700 |
| 2006-01-04 | 47.000000 | 47.240002 | 46.450001 | 46.580002 | 33.916729 | 17757900 |
| 2006-01-05 | 46.580002 | 46.830002 | 46.320000 | 46.639999 | 33.960411 | 14970700 |
| 2006-01-06 | 46.799999 | 46.910000 | 46.349998 | 46.570000 | 33.909451 | 12599800 |
| 2006-01-09 | 46.720001 | 46.970001 | 46.360001 | 46.599998 | 33.931290 | 15619400 |
| ... | ... | ... | ... | ... | ... | ... |
| 2021-08-02 | 38.400002 | 38.950001 | 37.910000 | 37.959999 | 37.959999 | 49894000 |
| 2021-08-03 | 38.060001 | 38.630001 | 37.549999 | 38.549999 | 38.549999 | 39680600 |
| 2021-08-04 | 38.029999 | 38.770000 | 37.860001 | 38.270000 | 38.270000 | 42289800 |
| 2021-08-05 | 38.570000 | 39.060001 | 38.549999 | 39.020000 | 39.020000 | 34754500 |
| 2021-08-06 | 39.470001 | 40.410000 | 39.470001 | 40.150002 | 40.150002 | 58000800 |
3926 rows × 6 columns
plt.figure(figsize=(16,6))
bank_stocks['BAC']['Close'].loc['2008-01-01':'2008-12-31'].rolling(window=30).mean().plot(label='30 day Moving Average')
bank_stocks['BAC']['Close'].loc['2008-01-01':'2008-12-31'].plot(label='BAC Close')
plt.legend()
<matplotlib.legend.Legend at 0x1b212056ac0>
sns.heatmap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(), annot=True)
<AxesSubplot:xlabel='Bank Ticker', ylabel='Bank Ticker'>
sns.clustermap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(), annot=True)
<seaborn.matrix.ClusterGrid at 0x1b21457f520>
BofA = bank_stocks.loc['2015-01-01':'2016-01-01']['BAC']
fig = go.Figure(data=[go.Candlestick(x=BofA.index,
open=BofA['Open'],
high=BofA['High'],
low=BofA['Low'],
close=BofA['Close'])])
fig.show()